Explore and Summarize Data for Red Wine, by Faisal ALsaeed

##   X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1           7.4             0.70        0.00            1.9     0.076
## 2 2           7.8             0.88        0.00            2.6     0.098
## 3 3           7.8             0.76        0.04            2.3     0.092
## 4 4          11.2             0.28        0.56            1.9     0.075
## 5 5           7.4             0.70        0.00            1.9     0.076
## 6 6           7.4             0.66        0.00            1.8     0.075
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
## 6                  13                   40  0.9978 3.51      0.56     9.4
##   quality
## 1       5
## 2       5
## 3       5
## 4       6
## 5       5
## 6       5
## [1] 1599   13
## 'data.frame':    1599 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...

The data set contains 1599 rows and 13 variables. Looking at it, I recognized that it holds several measurements from tests on red wine samples, and that these measurements appear to relate to wine quality. Searching the internet for these variables gave me the following information:
X: the serial number of each test sample. It has no relationship to the other properties of the sample.
fixed.acidity: acids are major wine constituents and contribute greatly to taste. They impart the sourness or tartness that is a fundamental feature of wine taste. Continuous variable.
volatile.acidity: the volatile acids in wine, mainly acetic acid; at high levels they give an unpleasant, vinegar-like taste. Continuous variable.
citric.acid: one of the three primary acids found in grapes, converted during the winemaking process. Grapes naturally contain 0.1 to 0.7 grams per liter of citric acid, which is about 10% of all acids. Continuous variable.

We need to explore the data with plots to check whether it needs wrangling and cleaning.

sapply(win, function(x) sum(is.na(x)))
##                    X        fixed.acidity     volatile.acidity 
##                    0                    0                    0 
##          citric.acid       residual.sugar            chlorides 
##                    0                    0                    0 
##  free.sulfur.dioxide total.sulfur.dioxide              density 
##                    0                    0                    0 
##                   pH            sulphates              alcohol 
##                    0                    0                    0 
##              quality       quality.rating 
##                    0                    0

We use the sapply function to count the missing values in each column of the data set. There are none.
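To make the missing-value check concrete, here is a minimal sketch on a made-up data frame (the `toy` values are illustrative and are not taken from the wine data). It shows the same `sapply` pattern next to `colSums(is.na(...))`, which should give the same counts.

```r
# Toy data frame standing in for `win` (values are made up for illustration)
toy <- data.frame(
  alcohol = c(9.4, 9.8, NA, 10.5),
  pH      = c(3.51, 3.20, 3.26, NA),
  quality = c(5L, 5L, 6L, 7L)
)

# Count missing values per column, as in the check above
na_counts <- sapply(toy, function(x) sum(is.na(x)))
print(na_counts)

# colSums(is.na(.)) gives the same counts without an anonymous function
na_counts_alt <- colSums(is.na(toy))
print(na_counts_alt)
```

A column whose count is greater than zero would need attention before plotting or computing summary statistics.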

Univariate Plots Section

I found a useful function, outlierKD, that shows a variable with and without its outliers and then removes them, so I am going to check each variable for outliers and remove them.
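The summaries in this section report `NA's` after each call, which suggests the outliers are replaced with `NA` rather than dropped as rows. A minimal sketch of that idea, assuming the boxplot (1.5 × IQR) rule; `remove_outliers` is a hypothetical helper written for illustration, not the outlierKD source, and the input values are made up:

```r
# Hypothetical helper: replace boxplot outliers with NA
# (assumed to resemble what outlierKD does; not its actual source)
remove_outliers <- function(x) {
  out <- boxplot.stats(x)$out   # points beyond 1.5 * IQR from the hinges
  x[x %in% out] <- NA
  x
}

x <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.9, 7.3, 7.5, 15.9)  # made-up values
cleaned <- remove_outliers(x)

sum(is.na(cleaned))   # number of values flagged as outliers
summary(cleaned)      # summary() reports the NA count, as in this section
```

Because the rows are kept, later functions have to cope with the `NA` values, for example through `na.rm = TRUE` or `use = "complete.obs"`.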

fixed.acidity

outlierKD(win, fixed.acidity, "yes")

## Outliers identified: 49
## Propotion (%) of outliers: 3.2
## Mean of the outliers: 13.29
## Mean without removing outliers: 8.32
## Mean if we remove outliers: 8.16
## Outliers successfully removed
summary(win$fixed.acidity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   4.600   7.100   7.900   8.163   9.100  12.300      49

The histogram is right-skewed with a long tail, and the median is 7.90. From searching the internet, the best range of fixed acidity for wine is between 7.10 g/dm^3 and 9.20 g/dm^3, so we remove the outliers to get a more accurate result.

volatile.acidity

outlierKD(win, volatile.acidity, 'yes')

## Outliers identified: 19
## Propotion (%) of outliers: 1.2
## Mean of the outliers: 1.13
## Mean without removing outliers: 0.53
## Mean if we remove outliers: 0.52
## Outliers successfully removed
summary(win$volatile.acidity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.1200  0.3900  0.5200  0.5206  0.6300  1.0100      19

The distribution is right-skewed, with a few outliers at the high end (their mean is 1.13). The median is 0.52, and we remove those outliers.

citric.acid

outlierKD(win, citric.acid, 'yes')

## Outliers identified: 1
## Propotion (%) of outliers: 0.1
## Mean of the outliers: 1
## Mean without removing outliers: 0.27
## Mean if we remove outliers: 0.27
## Outliers successfully removed
summary(win$citric.acid)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0900  0.2600  0.2705  0.4200  0.7900       1

The distribution is fairly flat but right-skewed overall. There is one outlier, which we remove.

residual.sugar

outlierKD(win, residual.sugar, 'yes')

## Outliers identified: 155
## Propotion (%) of outliers: 10.7
## Mean of the outliers: 5.88
## Mean without removing outliers: 2.54
## Mean if we remove outliers: 2.18
## Outliers successfully removed
summary(win$residual.sugar)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.90    1.90    2.10    2.18    2.50    3.65     155

The graph shows a right-skewed distribution with a long tail. There are many outliers (155, or 10.7%) to remove.

chlorides

outlierKD(win, chlorides, 'yes')

## Outliers identified: 112
## Propotion (%) of outliers: 7.5
## Mean of the outliers: 0.2
## Mean without removing outliers: 0.09
## Mean if we remove outliers: 0.08
## Outliers successfully removed
summary(win$chlorides)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.04100 0.07000 0.07800 0.07876 0.08700 0.11900     112

The distribution looks roughly normal, but with a long tail on the right.

free.sulfur.dioxide

outlierKD(win, free.sulfur.dioxide, 'yes')

## Outliers identified: 30
## Propotion (%) of outliers: 1.9
## Mean of the outliers: 51.9
## Mean without removing outliers: 15.87
## Mean if we remove outliers: 15.19
## Outliers successfully removed
summary(win$free.sulfur.dioxide)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    7.00   13.00   15.19   21.00   42.00      30

The distribution is right-skewed with a long tail on the right. There is a large gap between 57 and 66, so we remove the outliers.

total.sulfur.dioxide

outlierKD(win, total.sulfur.dioxide, 'yes')

## Outliers identified: 55
## Propotion (%) of outliers: 3.6
## Mean of the outliers: 143.89
## Mean without removing outliers: 46.47
## Mean if we remove outliers: 43
## Outliers successfully removed
summary(win$total.sulfur.dioxide)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       6      21      37      43      59     122      55

The distribution of total sulfur dioxide is right-skewed. There is a gap between 120 and 289, so we remove those values.

density

outlierKD(win, density, 'yes')

## Outliers identified: 45
## Propotion (%) of outliers: 2.9
## Mean of the outliers: 1
## Mean without removing outliers: 1
## Mean if we remove outliers: 1
## Outliers successfully removed
summary(win$density)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.9923  0.9956  0.9967  0.9967  0.9978  1.0010      45

The density distribution is bell-shaped. Most values fall between about 0.992 and 1.001, and we remove the values outside that range.

pH

outlierKD(win, pH, 'yes')

## Outliers identified: 35
## Propotion (%) of outliers: 2.2
## Mean of the outliers: 3.42
## Mean without removing outliers: 3.31
## Mean if we remove outliers: 3.31
## Outliers successfully removed
summary(win$pH)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   2.930   3.210   3.310   3.309   3.400   3.680      35

The histogram is bell-shaped. I noticed some outliers, which need to be removed.

sulphates

outlierKD(win, sulphates, 'yes')

## Outliers identified: 59
## Propotion (%) of outliers: 3.8
## Mean of the outliers: 1.23
## Mean without removing outliers: 0.66
## Mean if we remove outliers: 0.64
## Outliers successfully removed
summary(win$sulphates)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.3300  0.5500  0.6200  0.6364  0.7100  0.9900      59

The distribution is right-skewed with a long tail. Most values fall between 0.3 and 1.0, so the remaining values are treated as outliers.

alcohol

outlierKD(win, alcohol, 'yes')

## Outliers identified: 13
## Propotion (%) of outliers: 0.8
## Mean of the outliers: 13.91
## Mean without removing outliers: 10.42
## Mean if we remove outliers: 10.39
## Outliers successfully removed
summary(win$alcohol)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    8.40    9.50   10.10   10.39   11.00   13.50      13

The distribution is right-skewed.

Univariate Analysis

What is the structure of your dataset?

The dataset contains 1599 rows and 12 main variables (plus the serial number X) for red wine samples. The variables appear to be measurements or readings from wine tests.

What is/are the main feature(s) of interest in your dataset?

Quality is the feature of greatest interest in this analysis. We want to know whether the chemical properties affect quality or not.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

From my investigation and searching, I found that fixed.acidity, volatile.acidity, citric.acid, and alcohol have a strong relationship to wine quality.

Did you create any new variables from existing variables in the dataset?

No.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

Yes. While exploring the data I checked for outliers that might affect the results, and I removed them.

Bivariate Plots Section

We use a visualization of the correlation matrix to see which variables are most strongly related to quality, and then explore those variables further.

pairs matrix

library(corrplot)
library(PerformanceAnalytics)  # provides chart.Correlation(), used below

cor(win[c(2:13)], use = "complete.obs") 
##                      fixed.acidity volatile.acidity  citric.acid
## fixed.acidity           1.00000000      -0.27609419  0.662847205
## volatile.acidity       -0.27609419       1.00000000 -0.629348130
## citric.acid             0.66284720      -0.62934813  1.000000000
## residual.sugar          0.23153189       0.02707711  0.154331333
## chlorides               0.19803533       0.11154318  0.072543374
## free.sulfur.dioxide    -0.15023023      -0.01661600 -0.071628176
## total.sulfur.dioxide   -0.08968068       0.10167389 -0.001034774
## density                 0.60707349       0.04331300  0.302618617
## pH                     -0.68487998       0.22979933 -0.476569561
## sulphates               0.16088057      -0.31734401  0.257355006
## alcohol                -0.03923170      -0.22226705  0.141529284
## quality                 0.11067147      -0.35221458  0.219897255
##                      residual.sugar   chlorides free.sulfur.dioxide
## fixed.acidity            0.23153189  0.19803533         -0.15023023
## volatile.acidity         0.02707711  0.11154318         -0.01661600
## citric.acid              0.15433133  0.07254337         -0.07162818
## residual.sugar           1.00000000  0.23349630          0.08455430
## chlorides                0.23349630  1.00000000          0.01365609
## free.sulfur.dioxide      0.08455430  0.01365609          1.00000000
## total.sulfur.dioxide     0.19267101  0.17465073          0.61943057
## density                  0.39374030  0.41236911         -0.02458102
## pH                      -0.05839857 -0.17530599          0.14442249
## sulphates                0.04474094 -0.08123725          0.10317636
## alcohol                  0.09903333 -0.30373254         -0.02068163
## quality                  0.01543850 -0.19227105         -0.00278825
##                      total.sulfur.dioxide     density          pH
## fixed.acidity                -0.089680682  0.60707349 -0.68487998
## volatile.acidity              0.101673892  0.04331300  0.22979933
## citric.acid                  -0.001034774  0.30261862 -0.47656956
## residual.sugar                0.192671014  0.39374030 -0.05839857
## chlorides                     0.174650727  0.41236911 -0.17530599
## free.sulfur.dioxide           0.619430566 -0.02458102  0.14442249
## total.sulfur.dioxide          1.000000000  0.14734867  0.01340722
## density                       0.147348673  1.00000000 -0.22290076
## pH                            0.013407221 -0.22290076  1.00000000
## sulphates                    -0.051739587  0.07025678  0.01266580
## alcohol                      -0.242881973 -0.54606155  0.11883252
## quality                      -0.198538582 -0.23633307 -0.07572362
##                        sulphates     alcohol     quality
## fixed.acidity         0.16088057 -0.03923170  0.11067147
## volatile.acidity     -0.31734401 -0.22226705 -0.35221458
## citric.acid           0.25735501  0.14152928  0.21989725
## residual.sugar        0.04474094  0.09903333  0.01543850
## chlorides            -0.08123725 -0.30373254 -0.19227105
## free.sulfur.dioxide   0.10317636 -0.02068163 -0.00278825
## total.sulfur.dioxide -0.05173959 -0.24288197 -0.19853858
## density               0.07025678 -0.54606155 -0.23633307
## pH                    0.01266580  0.11883252 -0.07572362
## sulphates             1.00000000  0.27249357  0.41535589
## alcohol               0.27249357  1.00000000  0.51137367
## quality               0.41535589  0.51137367  1.00000000
chart.Correlation(win[, c(2:13)], histogram=TRUE, pch=19)

I use these exploration tools to check the correlations between variables. We want to find which variables have the largest and smallest correlation coefficients with quality. From the correlation plot, alcohol vs quality is 0.48, sulphates vs quality is 0.25, citric.acid vs quality is 0.23, and fixed.acidity vs quality is 0.12. Some variables are also related to each other: fixed.acidity vs density is 0.67, citric.acid vs pH is -0.54, and fixed.acidity vs pH is -0.68. The matrix printed above uses only complete rows after outlier removal, so some of its coefficients differ from these (for example, 0.51 for alcohol vs quality and 0.42 for sulphates vs quality).
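Reading the coefficients off a large matrix by eye is error-prone. As a rough sketch, the quality column of the correlation matrix can be extracted and sorted by absolute value. The `toy` data frame below is simulated for illustration only; with the real data the same steps would be applied to `win`.

```r
# Simulated stand-in for the wine data (not real measurements)
set.seed(42)
toy <- data.frame(
  alcohol   = rnorm(100),
  sulphates = rnorm(100),
  pH        = rnorm(100)
)
toy$quality <- 0.6 * toy$alcohol + 0.3 * toy$sulphates + rnorm(100)

# Correlation of every variable with quality, using complete rows only
cors <- cor(toy, use = "complete.obs")[, "quality"]
cors <- cors[names(cors) != "quality"]   # drop quality's correlation with itself

# Rank by strength, ignoring sign
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 2))
```

Sorting by `abs()` keeps strong negative relationships near the top of the list alongside strong positive ones.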

Heatmap matrix

bi_data <- win[, c(2:13)] 
data1 <- as.matrix(bi_data) 
col<-  colorRampPalette(c("blue", "red"))(100)
heatmap(data1, col = col)

Here I tried a heatmap to make it easier to see which coefficient is the largest positive one, but the result was not clear because the margins were too large.
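One likely reason the heatmap above is hard to read is that `heatmap()` receives the raw measurements (1599 rows) rather than the correlation coefficients. A minimal sketch of the alternative, assuming the goal is a picture of the correlation matrix: compute `cor()` first and plot the resulting square matrix. The `toy` data is simulated, and the three-colour palette and margin sizes are illustrative choices.

```r
# Simulated stand-in for the numeric columns of `win`
set.seed(1)
toy <- data.frame(a = rnorm(50))
toy$b <- toy$a + rnorm(50, sd = 0.5)    # positively related to a
toy$c <- -toy$a + rnorm(50, sd = 0.5)   # negatively related to a

# Correlate first, then plot the square matrix of coefficients
cor_mat <- cor(toy, use = "complete.obs")
col <- colorRampPalette(c("blue", "white", "red"))(100)

# symm = TRUE treats the matrix as symmetric; scale = "none" keeps the
# coefficients as they are instead of rescaling each row
heatmap(cor_mat, col = col, symm = TRUE, scale = "none", margins = c(8, 8))
```

With only one row and column per variable, the axis labels fit in the margins and each cell corresponds to one coefficient.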

alcohol vs quality is 0.48

# Boxplots of alcohol per quality score, with the per-score median overlaid.
# The loess smoother is dropped: it cannot be fitted to a handful of
# discrete quality scores, which is what caused the computation to fail.
ggplot(win, aes(x = factor(quality), y = alcohol)) +
  geom_boxplot(aes(color = factor(quality.rating))) +
  stat_summary(aes(group = 1),
               fun = median,   # ggplot2 >= 3.3.0; older versions use fun.y
               geom = "line",
               color = "#E74C3C",
               size = 1,
               alpha = 0.8)
## Warning: Removed 13 rows containing non-finite values (stat_boxplot).
## Warning: Removed 13 rows containing non-finite values (stat_summary).

cor.test(win$quality,win$alcohol,method = 'pearson')
## 
##  Pearson's product-moment correlation
## 
## data:  win$quality and win$alcohol
## t = 21.252, df = 1584, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4318135 0.5084571
## sample estimates:
##       cor 
## 0.4710238
 by(win$alcohol, win$quality, median)
## win$quality: 3
## [1] 9.925
## -------------------------------------------------------- 
## win$quality: 4
## [1] 10
## -------------------------------------------------------- 
## win$quality: 5
## [1] NA
## -------------------------------------------------------- 
## win$quality: 6
## [1] NA
## -------------------------------------------------------- 
## win$quality: 7
## [1] NA
## -------------------------------------------------------- 
## win$quality: 8
## [1] NA
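The `NA` medians above are a side effect of the outlier removal: `median()` returns `NA` as soon as a group contains a missing value. A small sketch on made-up numbers shows the difference `na.rm = TRUE` makes. It uses `tapply`, and the same extra argument can be passed through `by()`.

```r
# Made-up values; NA marks alcohol readings removed as outliers
toy <- data.frame(
  quality = c(3, 3, 5, 5, 5, 6, 6),
  alcohol = c(9.9, 10.1, 9.4, NA, 9.8, 10.5, NA)
)

# Default behaviour: any NA in a group makes that group's median NA
med_default <- tapply(toy$alcohol, toy$quality, median)
print(med_default)

# na.rm = TRUE drops the missing values before taking the median
med_na_rm <- tapply(toy$alcohol, toy$quality, median, na.rm = TRUE)
print(med_na_rm)
```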

Looking at the graph, the boxplot for the excellent wines has the highest alcohol, so as alcohol increases, quality increases as well.

fixed.acidity vs density

ggplot(aes(x=fixed.acidity,y=density), data = win)+
  geom_point()+ stat_smooth(colour='blue', span=0.2)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 80 rows containing non-finite values (stat_smooth).
## Warning: Removed 80 rows containing missing values (geom_point).

cor.test(win$fixed.acidity,win$density,method = 'pearson')
## 
##  Pearson's product-moment correlation
## 
## data:  win$fixed.acidity and win$density
## t = 29.358, df = 1517, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5688469 0.6330521
## sample estimates:
##       cor 
## 0.6019214

Looking at the scatter plot, there is a relationship between fixed.acidity and density. The trend goes up, which means that as fixed.acidity increases, density increases as well.
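As a quick check on the `cor.test` output above, the t statistic can be recomputed from the reported correlation and degrees of freedom using the standard relation t = r * sqrt(df / (1 - r^2)). This is only a sketch of the arithmetic, with `r` and `df` copied from the printed output:

```r
# Values copied from the cor.test output for fixed.acidity vs density
r  <- 0.6019214
df <- 1517

# t statistic of a Pearson correlation
t_stat <- r * sqrt(df / (1 - r^2))
print(round(t_stat, 3))

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)
print(p_value)
```

The result should agree with the `t = 29.358` line in the output, up to rounding of the reported correlation.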

citric.acid vs pH

ggplot(aes(x=citric.acid,y=pH), data = win)+geom_point()+ stat_smooth(colour='blue', span=0.2)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 35 rows containing non-finite values (stat_smooth).
## Warning: Removed 35 rows containing missing values (geom_point).

cor.test(win$citric.acid,win$pH,method = 'pearson')
## 
##  Pearson's product-moment correlation
## 
## data:  win$citric.acid and win$pH
## t = -25.071, df = 1562, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5700970 -0.4993589
## sample estimates:
##        cor 
## -0.5356671

Looking at the scatter plot, there is a relationship between citric.acid and pH. The trend goes down, which means that as citric.acid increases, pH decreases.

fixed.acidity vs pH

ggplot(aes(x=fixed.acidity,y=pH), data = win)+geom_point()+ stat_smooth(colour='blue', span=0.2)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 79 rows containing non-finite values (stat_smooth).
## Warning: Removed 79 rows containing missing values (geom_point).

cor.test(win$fixed.acidity,win$pH,method = 'pearson')
## 
##  Pearson's product-moment correlation
## 
## data:  win$fixed.acidity and win$pH
## t = -33.063, df = 1518, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6753445 -0.6168234
## sample estimates:
##        cor 
## -0.6470359

Looking at the scatter plot, there is a relationship between fixed.acidity and pH. The trend goes down, which means that as fixed.acidity increases, pH decreases.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

I used exploration tools such as the pairs matrix and the heatmap to check the correlations between variables, because we need to find which variables have the largest and smallest correlation coefficients with quality. From the correlation plot, alcohol vs quality is 0.48, sulphates vs quality is 0.25, citric.acid vs quality is 0.23, and fixed.acidity vs quality is 0.12. Some variables are also related to each other: fixed.acidity vs density is 0.67, citric.acid vs pH is -0.54, and fixed.acidity vs pH is -0.68.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

Yes.

What was the strongest relationship you found?

The strongest relationship with quality was alcohol, at 0.48. Among all pairs of variables, fixed.acidity vs pH was stronger in magnitude, at -0.68.

Multivariate Plots Section

quality, fixed.acidity vs. density

ggplot(data = win, aes(x = fixed.acidity, y = density,
                       color = quality.rating)) +
  geom_point() + geom_smooth(method = 'lm', se = FALSE)
## Warning: Removed 80 rows containing non-finite values (stat_smooth).
## Warning: Removed 80 rows containing missing values (geom_point).

Looking at the scatter plot, there is a relationship that we can describe as a high positive correlation between fixed.acidity and density.
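The coloured lines drawn by `geom_smooth(method = 'lm')` are separate linear fits, one per quality rating. To compare them numerically, the slope of each fit can be extracted. This is a minimal sketch on simulated data; the rating labels `bad`, `average`, and `good` are hypothetical placeholders for the levels of `quality.rating`.

```r
# Simulated stand-in for the wine data (not real measurements)
set.seed(7)
toy <- data.frame(
  fixed.acidity  = runif(90, 5, 12),
  quality.rating = rep(c("bad", "average", "good"), each = 30)  # hypothetical labels
)
toy$density <- 0.990 + 0.001 * toy$fixed.acidity + rnorm(90, sd = 0.0005)

# Fit density ~ fixed.acidity separately within each rating and keep the slope
slopes <- sapply(split(toy, toy$quality.rating), function(d) {
  coef(lm(density ~ fixed.acidity, data = d))[["fixed.acidity"]]
})
print(slopes)
```

Similar slopes across ratings would suggest that quality does not change how the two variables relate to each other.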

quality, citric.acid vs. pH

ggplot(data = win, aes(x = citric.acid, y = pH,
                       color = quality.rating)) +
  geom_point() + geom_smooth(method = 'lm', se = FALSE)
## Warning: Removed 35 rows containing non-finite values (stat_smooth).
## Warning: Removed 35 rows containing missing values (geom_point).

Looking at the scatter plot, there is a relationship that we can describe as a high negative correlation between citric.acid and pH. There is an outlier.

quality, citric.acid vs. volatile.acidity

ggplot(data = win, aes(x = citric.acid, y = volatile.acidity,
                       color = quality.rating)) +
  geom_point() + geom_smooth(method = 'lm', se = FALSE)
## Warning: Removed 20 rows containing non-finite values (stat_smooth).
## Warning: Removed 20 rows containing missing values (geom_point).

Looking at the scatter plot, there is a relationship that we can describe as a high negative correlation between citric.acid and volatile.acidity. There are some outlier values.

quality, chlorides vs. density

ggplot(data = win, aes(x = chlorides, y = density,
                       color = quality.rating)) +
  geom_point() + geom_smooth(method = 'lm', se = FALSE)
## Warning: Removed 149 rows containing non-finite values (stat_smooth).
## Warning: Removed 149 rows containing missing values (geom_point).

The variables do not seem to be correlated, and the trend varies across the wine quality ratings.

quality, fixed.acidity vs. density

ggplot(data = win, aes(x = fixed.acidity, y = density,
                       color = quality.rating)) +
  geom_point() + geom_smooth(method = 'lm', se = FALSE)
## Warning: Removed 80 rows containing non-finite values (stat_smooth).
## Warning: Removed 80 rows containing missing values (geom_point).

Looking at the scatter plot, there is a relationship that we can describe as a high positive correlation between fixed.acidity and density. There are some outlier values.

quality, residual sugar vs. free sulphur dioxide

ggplot(data = win, aes(x = residual.sugar, y = free.sulfur.dioxide,
                       color = quality.rating)) +
  geom_point() + geom_smooth(method = 'lm', se = FALSE)
## Warning: Removed 171 rows containing non-finite values (stat_smooth).
## Warning: Removed 171 rows containing missing values (geom_point).

There is no correlation between the variables, and the trend varies across the quality ratings.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

I discussed each relationship above.

Were there any interesting or surprising interactions between features?

I did not see any interesting interactions between features.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.


Final Plots and Summary

Plot One

## Warning: Removed 13 rows containing non-finite values (stat_boxplot).
## Warning: Removed 13 rows containing non-finite values (stat_summary).

Description One

Looking at the graph, the boxplot for the excellent wines has the highest alcohol, so as alcohol increases, quality increases as well. The median trend line also rises with quality.

Plot Two

## Warning: Removed 80 rows containing non-finite values (stat_smooth).
## Warning: Removed 80 rows containing missing values (geom_point).

Description Two

Looking at the scatter plot, there is a relationship that we can describe as a high positive correlation between fixed.acidity and density.

Plot Three

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 80 rows containing non-finite values (stat_smooth).
## Warning: Removed 80 rows containing missing values (geom_point).

Description Three

Reflection

At the beginning of the project, when selecting from the list of data sets, I hesitated to choose this one, since I have no chemistry background for understanding how the components affect each other. But because I love challenges, I decided to read more about the topic and begin analyzing the data. I encountered some difficulties, such as understanding the details of red wine components, but the subject was interesting. In this project I learned about many properties and their effect on the taste and quality of wine. I drew many conclusions, applied the lessons I learned about relationships and trends, and used many tools.

RESOURCES
1- http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
2- http://ggplot.yhathq.com/
3- https://datascienceplus.com/identify-describe-plot-and-removing-the-outliers-from-the-dataset/
4- https://en.wikipedia.org/wiki/Acids_in_wine
5- https://mste.illinois.edu/courses/ci330ms/youtsey/scatterinfo.html